LocationCache: Fixes read fallback to use WriteEndpoints[0] when PPAF enabled and all regions excluded #5823

Merged: ananth7592 merged 3 commits into main from users/ananth/5821 on May 4, 2026

@ananth7592 (Member) commented Apr 30, 2026

Problem

When ExcludeRegions filters out every region in ApplicationPreferredRegions (ApplicationPreferredRegions == ExcludeRegions), LocationCache.GetApplicableEndpoints falls back to this.defaultEndpoint, a static, region-agnostic URI that is set once at init and never updated. After a write region (hub) switch, the GlobalAddressResolver's cached EndpointCache for this default endpoint has a stale AddressResolver.location, which causes incorrect region tracking in diagnostics, per-partition routing, and retry logic.

Fix

When PPAF (IsPartitionLevelFailoverEnabled) is enabled, GetApplicableEndpoints now uses WriteEndpoints[0] (dynamic, tracks the current write region) as the read fallback instead of this.defaultEndpoint.

This aligns with:

  • UpdateLocationCache (L756-760) which already uses WriteEndpoints[0] for ReadEndpoints fallback
  • Java SDK: writeRegionalRoutingContexts.get(0)
  • Python SDK: get_write_regional_routing_contexts()[0]
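
The fallback choice described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the SDK's actual LocationCache code: `ReadFallbackSketch` and `SelectReadFallback` are illustrative names, and the guard for an empty write-endpoint list is an assumption added for safety.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the read-fallback decision.
// It is NOT the SDK's LocationCache; the names are illustrative only.
public static class ReadFallbackSketch
{
    // Picks the endpoint to use when ExcludeRegions has filtered out
    // every preferred read region.
    public static Uri SelectReadFallback(
        bool isPartitionLevelFailoverEnabled,
        IReadOnlyList<Uri> writeEndpoints,
        Uri defaultEndpoint)
    {
        // With PPAF enabled, prefer the first write endpoint, which
        // follows the current write region after a hub switch.
        // Assumption: if no write endpoint is known, use the default.
        if (isPartitionLevelFailoverEnabled
            && writeEndpoints != null
            && writeEndpoints.Count > 0)
        {
            return writeEndpoints[0];
        }

        // With PPAF disabled, keep the original behavior.
        return defaultEndpoint;
    }
}
```

In this sketch, passing `true` with a non-empty write-endpoint list returns the first write endpoint, and passing `false` always returns the static default.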

PPAF Gating

The fix is gated behind Func<bool> isPartitionLevelFailoverEnabled, wired from ConnectionPolicy.EnablePartitionLevelFailover through GlobalEndpointManager, which supports dynamic enablement per PR #5310. When PPAF is disabled, the original behavior (defaultEndpoint fallback) is preserved.
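
To illustrate why a Func<bool> rather than a plain bool supports dynamic enablement, here is a minimal sketch. `PolicySketch` and `CacheSketch` are hypothetical stand-ins (not ConnectionPolicy or LocationCache), and the string return value exists only to make the chosen branch visible.

```csharp
using System;

// Hypothetical stand-in for a policy object whose flag can change at runtime.
public sealed class PolicySketch
{
    public bool EnablePartitionLevelFailover { get; set; }
}

// Hypothetical stand-in for a cache that receives the gate as a delegate.
public sealed class CacheSketch
{
    private readonly Func<bool> isPartitionLevelFailoverEnabled;

    public CacheSketch(Func<bool> isPartitionLevelFailoverEnabled)
    {
        this.isPartitionLevelFailoverEnabled = isPartitionLevelFailoverEnabled
            ?? throw new ArgumentNullException(nameof(isPartitionLevelFailoverEnabled));
    }

    // The delegate is invoked on every call, so a flag that flips after
    // construction is observed without rebuilding the cache.
    public string FallbackKind()
    {
        return this.isPartitionLevelFailoverEnabled()
            ? "WriteEndpoints[0]"
            : "defaultEndpoint";
    }
}
```

Constructing the cache with `() => policy.EnablePartitionLevelFailover` and then setting the flag to true changes the result of the next `FallbackKind()` call, whereas a bool captured at construction time would stay fixed.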

Changes

  • LocationCache.cs: Added isPartitionLevelFailoverEnabled parameter; gated read fallback behind it
  • GlobalEndpointManager.cs: Wires ConnectionPolicy.EnablePartitionLevelFailover into LocationCache
  • LocationCacheTests.cs: 3 new tests covering PPAF on/off/dynamic toggle scenarios

Testing

All 94 LocationCacheTests pass.

Fixes #5821

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
…ind PPAF

When ExcludeRegions filters out all preferred read regions and PPAF
(Partition Level Failover) is enabled, GetApplicableEndpoints now falls back
to WriteEndpoints[0] (dynamic, tracks current write region) instead of
this.defaultEndpoint (static, region-agnostic URI set once at init).

The fix is gated behind isPartitionLevelFailoverEnabled (Func<bool>) wired
from ConnectionPolicy.EnablePartitionLevelFailover through GlobalEndpointManager,
supporting dynamic enablement per PR #5310.

When PPAF is disabled, original behavior (defaultEndpoint fallback) is preserved.

Fixes #5821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ananth7592 force-pushed the users/ananth/5821 branch from 8b6cfce to 038e757 on May 1, 2026 16:59
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

@kundadebdatta (Member) commented

Shall we also update the verbiage in the RequestOptions.ExcludeRegions property to set the correct expectation here?

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
…oint

Modified the verbiage for RequestOptions.ExcludeRegions to reflect the best-effort intent and to disambiguate the primary/hub terminology.

Validates that when GlobalPartitionEndpointManagerCore sets
LocationEndpointToRoute (partition-level failover override),
ResolveServiceEndpoint returns it directly at L341, bypassing
ExcludeRegions filtering entirely. This proves no additional
PPAF condition is needed in the GetApplicableEndpoints fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
Comment thread Microsoft.Azure.Cosmos/src/RequestOptions/RequestOptions.cs
@kushagraThapar (Member) left a comment

LGTM, thanks @ananth7592

@ananth7592 enabled auto-merge (squash) May 4, 2026 19:11
@ananth7592 merged commit 9939bf4 into main May 4, 2026
32 checks passed
@ananth7592 deleted the users/ananth/5821 branch May 4, 2026 19:37
Development

Successfully merging this pull request may close these issues.

Reads routed via defaultEndpoint do not failover after write region switch when ExcludeRegions filters all preferred regions

4 participants